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A PISE impiernentation of dynamic code decompression 
Marc L. Corliss, E. Christopher Lewis, Amir Roth 

June 2003 ACM SIGPLAN Notices , Proceedings of the 2003 ACM SIGPLAN conference 

on Language, compiler, and tool for embedded systems, volume 38 issue 7 
Full text available: ■^pdtV291,52.KB) Additional Information: Ml-CjiatJon, abstract, references, jndex.teri-ns 

Code compression coupled with dynamic decompression is an important technique for both 
embedded and general-purpose microprocessors. Post-fetch decompression, in which 
decompression is performed after the compressed instructions have been fetched, allows the 
instruction cache to store compressed code but requires a highly efficient decompression 
implementation. We propose implementing post-fetch decompression using dynamic 
instruction stream editing (DISE), a programmable decoder— ... 
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Andrzej Szymczak, Jarek Rosslgnac 

June 1999 Proceedings of the fifth ACM symposium on Solid modeling and applications 

Full text available: j::'dfti.35 fvlB) Additional Information: fijil citation , references , dtlnas , index terrris 



^ DISE: a programmable macro engine for customizing applications B 
Marc L. Corliss, E. Christopher Lewis, Amir Roth 

May 2003 ACM SIGARCH Computer Architecture News , Proceedings of the 30th 

annual international symposium on Computer architecture, volume 3i issue 2 
Full text available: '^.pdl(335,30..K^^ Additional Information: MLpjtalion, abstract, refereaGes 

Dynamic Instruction Stream Editing (DISE) is a cooperative software-hardware scheme 
for efficiently adding customization functionality— e.g, safety/security checking, profiling, 
dynamic code decompression, and dynamic optimization— to an application. In DISE, 
application customization functions (ACFs) are formulated as rules for macro-expanding 
certain instructions into parameterized instruction sequences. The processor executes the 
rules on the fetched instructions, feeding the executi ... 



4 Tool support for architectural decisions in embedded systems: CoCo: a 

hardware/software platform for rapid prototyping of code compression technologies 
Maris Lekatsas, Jorg Henkel, Srimat Chakradhar, Venkata Jakkula, Murugan Sankaradass 
June 2003 Proceedings of the 40th conference on Design automation 

Full text available: *^pdtV243J0„KBJ Additional Information: MLQitation, abstract, relejences, indexjerms 
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In recent years instruction code compression/decompression technologies have emerged as 
an efficient way to a) reduce the memory usage of an embedded system, b) to improve 
performance through effectively higher bandwidths and/or to c) reduce the overall power 
consumption of a system processing compressed code. We have presented efficient code 
compression/decompression techniques and architectures in the past. For the 
commercialization phase, we designed a novel hardware/software code compression ... 

Keywords: code compression, embedded systems 
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A digital presentation technology that manages anything from text to full-motion video has 
the potential of expanding the usefulness of personal computers, while rendering them less 
intimidating. 

Techniques for obtaining high performance in Java programs 
Iffat H. Kazi, Howard H. Chen, Berdenia Stanley, David J. Lilja 
September 2000 ACM Computing Surveys (CSUR), volume 32 issue 3 

Additional Information: Ml.cMiQn, abstract, references, dtjngs, .index 



Full text available: i^pdfiSI 6. 13 KB) 

^ terms 

This survey describes research directions in techniques to improve the performance of 
programs written in the Java programming language. The standard technique for Java 
execution is Interpretation, which provides for extensive portability of programs. A Java 
interpreter dynamically executes Java bytecodes, which comprise the instruction set of the 
Java Virtual Machine (JVM). Execution time performance of Java programs can be improved 
through compilation, possibly at the expense of portabili ... 

Keywords: Java, Java virtual machine, bytecode-to-source translators, direct compilers, 
dynamic compilation, interpreters, just-in-time compilers 



7 Static load classification for improving the value predictabilitv of data-cache misses B 
Martin Burtscher, Amer Diwan, Matthias Hauswirth 

May 2002 ACM SIGPLAN Notices , Proceedings of the ACM SIGPLAN 2002 Conference 

on Programming language design and implementation, volume 3? issue 5 
Full text available: '^,^4tL2I3.'2§..}!>Bl Additional Information: fylLQitation, abstrM, references, index ternis 

While caches are effective at avoiding most main- memory accesses, the few remaining 
memory references are still expensive. Even one cache miss per one hundred accesses can 
double a program's execution time. To better tolerate the data-cache miss latency, 
architects have proposed various speculation mechanisms, including load-value prediction. A 
load-value predictor guesses the result of a load so that the dependent instructions can 
immediately proceed without having to wait for the memory access ... 

Keywords: load-value prediction, type-based analysis 
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May 2000 ACM Transactions on Programming Languages and Systems (TOPLAS), 
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A program executing on a low-end embedded system, such as a smart-card, faces scarce 
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memory resources and fixed execution time constraints. We demonstrate that factorization 
of common instruction sequences in Java bytecode allows the memory footprint to be 
reduced, on average, to 85% of its original size, with a minimal execution time penalty. 
While preserving Java compatibility, our solution requires only a few modifications which are 
straightforward to implement in any JVM used in a low-e ... 

Keywords: Java bytecode, code compression, embedded systems 



^ Ib&-SimpieScala.rjoo^ 
Doug Burger, Todd M. Austin 

June 1997 ACM SIGARCH Computer Architecture News, volume 2S issue 3 

Full text available: odf^ 935.43 KB) Additional Information: fitll ciiaiion. abstf gict . citlnot?. index terms 



This document describes release 2.0 of the SimpleScalar tool set, a suite of free, publicly 
available simulation tools that offer both detailed and high-performance simulation of 
modern microprocessors. The new release offers more tools and capabilities, precompiled 
binaries, cleaner interfaces, better documentation, easier installation, improved portability, 
and higher performance. This paper contains a complete description of the tool set, Including 
retrieval and installation instructions, a d ... 

^ ^ Compilation .aod.run4 i 
Giuseppe Desoii, Nikolay Mateev, Evelyn Duesterwald, Paolo Faraboschi, Joseph A. Fisher 
November 2002 Proceedings of the 35th annual ACM/IEEE international symposium on 
Microarchitecture 

Full text available: ^ ^7 ^^^g^ ^ Additional Information: .M.PMiQ.a, abstract, references, dtjng.s, index 

Publisher Site '^^^ 

The Dynamic Execution Layer Interface (DELI) offers the following unique capability: it 
provides fine-grain control over the execution of programs, by allowing its clients to observe 
and optionally manipulate every single instruction— at run time— -just before it runs. DELI 
accomplishes this by opening up an interface to the layer between the execution of software 
and hardware. To avoid the slowdown, DELI caches a private copy of the executed code and 
always runs out of its own private cache. In ... 
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Lei Li, Krishnendu Chakrabarty, Nur A. Touba 

October 2003 ACM Transactions on Design Automation of Electronic Systems 

(TODAES), Volume 8 Issue 4 
Full text available: *^|;\d£59A.A6.KBj Additional Information: Ml.cjtatjpn, al?stract, references, jndex.terrris 

We present a dictionary-based test data compression approach for reducing test data 
volume in SOCs. The proposed method is based on the use of a small number of ATE 
channels to deliver compressed test patterns from the tester to the chip and to drive a large 
number of internal scan chains in the circuit under test. Therefore, it is especially suitable 
for a reduced pin-count and low-cost DFT test environment, where a narrow interface 
between the tester and the SOC is desirable. The dictionary-ba ... 

Keywords: Embedded core testing, SoC testing, reduced pin-count testing, test application 
time, test data volume 
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During the last 15 years, embedded systems have grown in complexity and performance to 
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rival desktop systems. The architectures of these systems present unique challenges to 
processor microarchitecture, including instruction encoding and instruction fetch processes. 
This paper presents new techniques for reducing embedded system code size without 
reducing functionality. This approach is to extract the pipeline decoder logic for an 
embedded VLIW processor in software at system develo ... 

13 The ULTRAVIS system 
Gunter Knittel 

October 2000 Proceedings of the 2000 IEEE symposium on Volume visualization 

Full text available: ■^pdfOJ2375„KBj Additional Information: Ml-?iMion, reMences, citincs^ i.nciex texD^ls 
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Shivajit Mohapatra, Radu Cornea, Nikil Dutt, Alex Nicolau, Nalini Venkatasubramanian 
November 2003 Proceedings of the eleventh ACM international conference on 
Multimedia 

Full text available: :xif(417.95 KB) Additional Information: fiiil citation . abstrg;Ct . references , index terrris 

Optinnizing user experience for streaming video applications on liandheld devices is a 
significant researcli challenge. In this paper, we propose an integrated power management 
approach that unifies low level architectural optimizations (CPU, memory, register), OS 
power-saving mechanisms (Dynamic Voltage Scaling) and adaptive middleware techniques 
(admission control, optimal transcoding, network traffic regulation). Specifically, we identify 
interaction parameters between the different levels and o ... 

Keywords: cross-layer adaptation, low-power, multimedia streaming 



increasing effective IPC by exploiting distant paralleiisfrt 
Ivan Martel, Daniel Ortega, Eduard Ayguade, Mateo Valero 

May 1999 Proceedings of the 13th international conference on Supercomputing 
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Ben Juurlink, Pepijn de Langen 

April 2004 Proceedings of the first conference on computing frontiers on Computing 
frontiers 

Full text available: !::-dft255.88 KB) Additional Information: full citation , cibslrad , references , index terms 

Memory transfers, in particular from/to off-chip memories, consume a significant amount of 
power. In order to reduce the amount of off-chip memory traffic, one or more levels of 
cache can be employed, located on the same die as the processor core. For performance, 
energy, and cost reasons, it is expedient that the on-chip cache is small and direct-mapped. 
Small, direct-mapped caches, however, generally produce much more traffic than needed. 
The purpose of this paper is two-fold. First, to measur ... 
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Keywords: caches, embedded processors, memory traffic, power consumption 



1^ Understanding and improving operating system effects in control flow prediction 
Tao Li, Lizy Kurian John, Anand Sivasubramaniam, N. Vijaykrishnan, Juan Rubio 
October 2002 Tenth international conference on architectural support for programming 
languages and operating systems on Proceedings of the 10th 
international conference on architectural support for programming 
languages and operating systems (ASPLOS-X), volume 37 , 30 , 36 issue lo , 5 , s 
Full text available: ' g-)cjft1.50 MB) Additional Information: full cjt;ation . abstract, references. c:ttnas 

Many modern applications result in a significant operating system (OS) component. The OS 
component has several implications including afi'ecting the control flow transfer in the 
execution environment. This paper focuses on understanding the operating system effects 
on control flow transfer and prediction, and designing architectural support to alleviate the 
bottlenecks. We characterize the control flow transfer of several emerging applications on a 
commercial operating system. We find that the ex ... 

19 Profile-based dynamic voltage and frequency scaiing for a multiple ciock domain 
niicroprocessor 

Grigorios Magklis, Michael L. Scott, Greg Semeraro, David H. Albonesi, Steven Dropsho 
May 2003 ACM SIGARCH Computer Architecture News , Proceedings of the 30th 

annual international symposium on Computer architecture, voiunne 3i issue 2 
Full text available: ' ^pdf(541.83 KB) Additional Information: hiW ciiation . abstr^ict . references 

A Multiple Clock Domain (MCD) processor addresses the challenges of clock distribution and 
power dissipation by dividing a chip into several (coarse-grained) clock domains, allowing 
frequency and voltage to be reduced in domains that are not currently on the application's 
critical path. Given a reconfiguration mechanism capable of choosing appropriate times and 
values for voltage/frequency scaling, an MCD processor has the potential to achieve 
significant energy savings with low performance degr ... 
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Jeremy Lau, Stefan Schoenmackers, Timothy Sherwood, Brad Calder 

October 2003 Proceedings of the international conference on Compilers, architectures 
and synthesis for embedded systems 

Full text available: *Q.pdf(228,45 Kg ! Additional Information: fujicitatipn, abstract, refereaces, index terilis 

In an embedded system, the cost of storing a program on-chip can be as high as the cost of 
a microprocessor. Compressing an application's code to reduce the amount of memory 
required is an attractive way to decrease costs. In this paper, we examine an executable 
form of program compression using echo instructions. With echo instructions, two or more 
similar, but not necessarily identical, sections of code can be reduced to a single copy of the 
repeating code. The single copy is left in the ... 

Keywords: code size optimization, compression, echo instructions 
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